High-depth whole genome sequencing of a large population-specific reference panel: Enhancing sensitivity, accuracy, and imputation

Authors

  • Todd Lencz
  • Jin Yu
  • Cameron Palmer
  • Shai Carmi
  • Danny Ben-Avraham
  • Nir Barzilai
  • Susan Bressman
  • Ariel Darvasi
  • Judy H. Cho
  • Lorraine N. Clark
  • Zeynep H. Gümüş
  • Vijai Joseph
  • Robert Klein
  • Steven Lipkin
  • Kenneth Offit
  • Harry Ostrer
  • Laurie J. Ozelius
  • Inga Peter
  • Gil Atzmon
  • Itsik Pe’er
Abstract

Background: While increasingly large reference panels for genome-wide imputation have been recently made available, the degree to which imputation accuracy can be enhanced by population-specific reference panels remains an open question. In the present study, we sequenced at full depth (≥30x) a moderately large (n=738) cohort of samples drawn from the Ashkenazi Jewish population across two platforms (Illumina X Ten and Complete Genomics, Inc.). We developed and refined a series of quality control steps to optimize sensitivity, specificity, and comprehensiveness of variant calls in the reference panel, and then tested the accuracy of imputation against target cohorts drawn from the same population.

Results: For samples sequenced on the Illumina X Ten platform, quality thresholds were identified that permitted highly accurate calling of single nucleotide variants across 94% of the genome. The Complete Genomics, Inc. platform was more conservative (fewer variants called) compared to the Illumina platform, but also demonstrated relatively greater numbers of false positives that needed to be filtered. Quality control procedures also permitted detection of novel genome reads that are not mapped to current reference or alternate assemblies. After stringent quality control, the population-specific reference panel produced more accurate and comprehensive imputation results relative to publicly available, large cosmopolitan reference panels. The population-specific reference panel also permitted enhanced filtering of clinically irrelevant variants from personal genomes.

Conclusions: Our primary results demonstrate enhanced accuracy of a population-specific imputation panel relative to cosmopolitan panels, especially in the range of infrequent (<5% non-reference allele frequency) and rare (<1% non-reference allele frequency) variants that may be most critical to further progress in mapping of complex phenotypes.

Similar resources

Improved imputation of low-frequency and rare variants using the UK10K haplotype reference panel

Imputing genotypes from reference panels created by whole-genome sequencing (WGS) provides a cost-effective strategy for augmenting the single-nucleotide polymorphism (SNP) content of genome-wide arrays. The UK10K Cohorts project has generated a data set of 3,781 whole genomes sequenced at low depth (average 7x), aiming to exhaustively characterize genetic variation down to 0.1% minor allele fr...

Estimation of genotype imputation accuracy using reference populations with varying degrees of relationship and marker density panel

Genotype imputation from low-density to high-density (SNP) chips is an important step before applying genomic selection, because denser chips can provide more reliable genomic predictions. In the current research, the accuracy of genotype imputation from low and moderate-density panels (5K and 50K) to high-density panels in the purebred and crossbred populations was assessed. The simulated popu...

Imputation from SNP chip to sequence: a case study in a Chinese indigenous chicken population

Background: Genome-wide association studies and genomic predictions are thought to be optimized by using whole-genome sequence (WGS) data. However, sequencing thousands of individuals of interest is expensive. Imputation from SNP panels to WGS data is an attractive and less expensive approach to obtain WGS data. The aims of this study were to investigate the accuracy of imputation and to provide...

Effect of Reference Population Size and Imputation Methods on the Accuracy of Imputation in Pure and Mixed Populations

Imputation, as a method of converting low-density chips to high-density chips, has been introduced to increase the accuracy of genomic selection in animals. In the current study, to investigate imputation accuracy, three populations, mixed (scenario 1), pure (scenario 2), and mixed + pure (scenario 3), were simulated using QMSim. Two imputation methods, Beagle and FImpute, were used fo...

Genotype imputation accuracy with different reference panels in admixed populations

Genome-wide association studies have successfully identified common variants that are associated with complex diseases. However, the majority of genetic variants contributing to disease susceptibility are yet to be discovered. It is now widely believed that multiple rare variants are likely to be associated with complex diseases. Using custom-made chips or next-generation sequencing to uncover ...

Publication date: 2017